
    Human exploration of complex knowledge spaces

    Driven by need or curiosity, as humans we constantly act as information seekers. Whenever we work, study or play, we naturally look for information in spaces where pieces of our knowledge and culture are linked through semantic and logical relations. Nowadays, far from being just an abstraction, these information spaces are complex structures, widespread and easily accessible via technological systems: from the whole World Wide Web to the paramount example of Wikipedia. They are all information networks. How we move on these networks, and how our learning experience could be made more efficient while exploring them, are the key questions investigated in the present thesis. To this end, concepts, tools and models from graph theory and complex systems analysis are borrowed to combine empirical observations of real user behaviour in knowledge spaces with theoretical findings from cognitive science research. The thesis investigates how the structure of a knowledge space can affect its own exploration in learning-type tasks, and how users typically explore information networks when looking for information or following learning paths. The research approach is exploratory and moves along three main lines. Extending previous work in algorithmic education, the first contribution focuses on the topological properties of the information network and how they affect the efficiency of a simulated learning exploration. To this end, a general class of algorithms is introduced that, building on well-established findings on educational scheduling, captures some of the behaviours of an individual moving in a knowledge space while learning. In exploring this space, learners move along connections, periodically revisiting some concepts, and sometimes jumping to very distant ones.
To investigate the effect of networked information structures on the dynamics, both synthetic graphs and real-world ones, such as subsections of Wikipedia and word-association graphs, are considered. The analysis reveals the existence of optimal topological structures for the defined learning dynamics. They feature small-world and scale-free properties, with a balance between the number of hubs and the number of least connected items. Surprisingly, the real-world networks analysed turn out to be close to optimal. To uncover the role of the semantic content of the information to be learned in information-seeking tasks, empirical data from user traffic logs of Wikipedia are then considered. From these data, by means of first-order Markov chain models, users' paths over the encyclopaedia can be simulated and treated as proxies for the real paths. These paths are then analysed at an abstract semantic level, by mapping the individual pages onto points of a reduced semantic space. Recurrent patterns emerge along the walks, and they become even more evident when contrasted with paths originating from goal-oriented information-seeking games, thus providing some hints about the unconstrained navigation of users while seeking information. Still, different systems need to be considered to evaluate longer, more constrained and more structured learning dynamics. This is the focus of the third line of investigation, in which learning paths are extracted from advanced scientific textbooks and treated as if they were walks suggested by their authors through an underlying knowledge space. Strategies to extract the paths from the textbooks are proposed, and some preliminary results on their statistical properties are discussed.
Moreover, by taking advantage of the Wikipedia information network, Kauffman's theory of the adjacent possible is formalized in a learning context, introducing the adjacent learnable: the part of the knowledge space that the reader can explore as she learns new concepts by following the suggested learning path. From this perspective, the paths are analysed as particular realizations of knowledge-space explorations, which makes it possible to contrast different approaches to education quantitatively.

    Optimal learning paths in information networks

    Each sphere of knowledge and information can be depicted as a complex mesh of correlated items. By properly exploiting these connections, innovative and more efficient navigation strategies could be defined, possibly leading to faster learning and more enduring retention of information. In this work we investigate how the topological structure embedding the items to be learned can affect the efficiency of the learning dynamics. To this end we introduce a general class of algorithms that simulate the exploration of knowledge/information networks, building on well-established findings on educational scheduling, namely the spacing and lag effects. While constructing their learning schedules, individuals move along connections, periodically revisiting some concepts, and sometimes jumping to very distant ones. To investigate the effect of networked information structures on the proposed learning dynamics, we focus on both synthetic and real-world graphs, such as subsections of Wikipedia and word-association graphs. We highlight the existence of optimal topological structures for the simulated learning dynamics, whose efficiency is affected by the balance between hubs and the least connected items. Interestingly, the real-world graphs we considered naturally lead to almost optimal learning performance.

    Search strategies of Wikipedia readers

    The quest for information is one of the most common activities of human beings. Despite the impressive progress of search engines, finding the needed piece of information can still be very hard, as can acquiring specific competences and knowledge by shaping and following the proper learning paths. Indeed, the need to find sensible paths in information networks is one of the biggest challenges of our societies and, to address it effectively, it is important to investigate the strategies adopted by human users to cope with the cognitive bottleneck of finding their way in a growing sea of information. Here we focus on the case of Wikipedia and investigate a recently released dataset of users' clicks on the English Wikipedia, namely the English Wikipedia Clickstream. We perform a semantically charged analysis to uncover the general patterns followed by information seekers in the multi-dimensional space of Wikipedia topics/categories. We discover the existence of well-defined strategies in which users tend to start from very general, i.e., semantically broad, pages and progressively narrow down the scope of their navigation, while maintaining a growing semantic coherence. This is unlike the strategies associated with tasks that have predefined search goals, namely the case of the Wikispeedia game. In that case users first move from the ‘particular’ to the ‘universal’ before focusing down again on the required target. The clear picture offered here represents an important stepping stone towards a better design of information networks and recommendation strategies, as well as towards the construction of radically new learning paths.

    Datasets under consideration.

    <p>In (A) we illustrate the English Wikipedia Clickstream dataset. The 9 different external sources plus the MainPage are shown together with the fraction of flux outgoing from each of them. The paths considered in our analysis start from one of the 9 sources and then randomly walk over the Wikipedia articles according to the transition counts provided by the dataset. (B) Two examples of paths followed by players of the Wikispeedia game, whose task was to navigate a reduced version of Wikipedia from a given starting page to a given target page (from <i>House</i> to <i>Electric_Field</i> in the example).</p>
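The path-generation step described in (A) amounts to a first-order Markov chain: the next article depends only on the current one and is drawn with probability proportional to its click count. A minimal Python sketch follows; the `(referrer, target, count)` row format, the function names and the toy counts are illustrative assumptions, not the dataset's actual schema or the authors' code.

```python
import random
from collections import defaultdict


def build_transitions(clicks):
    """Group (referrer, target, count) rows into per-page lists of (target, count)."""
    transitions = defaultdict(list)
    for referrer, target, count in clicks:
        if count > 0:  # ignore empty rows
            transitions[referrer].append((target, count))
    return dict(transitions)


def sample_path(transitions, source, max_steps, rng):
    """First-order Markov walk starting at `source`.

    At every step the next page is drawn with probability proportional to the
    click count from the current page; the walk stops at a page with no
    recorded outgoing clicks or after `max_steps` jumps.
    """
    path = [source]
    current = source
    while len(path) - 1 < max_steps and current in transitions:
        targets, counts = zip(*transitions[current])
        current = rng.choices(targets, weights=counts, k=1)[0]
        path.append(current)
    return path


# Toy counts (hypothetical, not taken from the real Clickstream data).
clicks = [
    ("google", "Physics", 8),
    ("google", "Isaac_Newton", 2),
    ("Physics", "Isaac_Newton", 5),
    ("Isaac_Newton", "Gravity", 3),
]
transitions = build_transitions(clicks)
print(sample_path(transitions, "google", max_steps=10, rng=random.Random(42)))
```

The stopping rule (dead end or `max_steps` jumps) is a choice made for this sketch; how the original analysis terminates its walks is not stated in the caption.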

    Paths generated from the external source <i>google</i>: averages.

    <p>The 10<sup>7</sup> paths simulated with <i>google</i> as source were split by length. For each fixed length <i>l</i>, we computed the averages of the following quantities over all the nodes (pairs) at <i>k</i> steps (jumps) from the end: (A) the norm, (B) the entropy, (C) the distance and (E) the similarity between all pairs of nodes visited consecutively along each path, and (D) the distance and (F) the similarity between every visited node and the ending node of each path. The error bars display the standard errors of the means. Each color refers to a path length, from 3 (blue) to 9 (light green).</p>
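The averaging in this caption groups values by path length l and by the number of steps k remaining to the end of the path. The sketch below works under stated assumptions: each path is supplied as a hypothetical `(length, values)` pair, with one value per node (or per consecutive pair) in path order, and the standard error is taken as the sample standard deviation divided by the square root of the count.

```python
import math
from collections import defaultdict


def averages_by_steps_to_end(paths):
    """Average an observable by path length l and steps-to-end k.

    `paths` is a list of (length, values) pairs; `values` holds one number per
    node (or per consecutive pair) in path order, so the last entry has k = 0.
    Returns {l: {k: (mean, standard error of the mean)}}.
    """
    groups = defaultdict(list)
    for length, values in paths:
        for i, v in enumerate(values):
            k = len(values) - 1 - i  # steps (or jumps) remaining to the end
            groups[(length, k)].append(v)

    result = defaultdict(dict)
    for (length, k), vs in groups.items():
        n = len(vs)
        mean = sum(vs) / n
        if n > 1:
            var = sum((v - mean) ** 2 for v in vs) / (n - 1)  # sample variance
            sem = math.sqrt(var / n)
        else:
            sem = float("nan")  # undefined for a single observation
        result[length][k] = (mean, sem)
    return dict(result)


paths = [(3, [1.0, 2.0, 3.0]), (3, [3.0, 4.0, 5.0]), (2, [7.0, 9.0])]
print(averages_by_steps_to_end(paths)[3][0])  # (4.0, 1.0)
```

Keying on steps to the end, rather than steps from the start, aligns paths of one length at their final page, which is the reference point used for panels (D) and (F).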

    Rescaled averages over the simulated paths.

    <p>The left column reports the same data as <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0170746#pone.0170746.g004" target="_blank">Fig 4</a> after rescaling. The walk lengths are normalized to 1, and the per-step averages of the different measures (A)-(F) are rescaled by the mean value of the same measure evaluated over the whole set of nodes belonging to paths of the same length. The averages used to rescale the data are displayed in Fig D in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0170746#pone.0170746.s001" target="_blank">S1 File</a>. The central and right columns report similarly processed data for, respectively, a semantically uncorrelated model based on the <i>google</i> paths and the Wikispeedia paths. Each color refers to a path length, from 3 (blue) to 9 (light green). The standard errors of the means are reported.</p>
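The rescaling can be sketched as follows, assuming that "normalized to 1" means mapping step i of an l-node path to i/(l-1), and that rescaling means dividing each per-step average by the mean over all nodes of paths with that length. Both readings, and the function name `rescaled_profile`, are assumptions for illustration.

```python
from statistics import fmean


def rescaled_profile(paths):
    """Rescale per-step averages for paths that all have the same length.

    Assumptions: position i of an l-node path maps to i / (l - 1), and each
    per-step average is divided by the mean over every node of these paths.
    Returns a list of (normalized position, rescaled average) pairs.
    """
    length = len(paths[0])
    if length < 2 or any(len(p) != length for p in paths):
        raise ValueError("need paths of one common length >= 2")
    overall = fmean(v for p in paths for v in p)  # mean over all nodes
    profile = []
    for i in range(length):
        step_mean = fmean(p[i] for p in paths)  # average at step i
        profile.append((i / (length - 1), step_mean / overall))
    return profile


print(rescaled_profile([[1.0, 3.0], [1.0, 3.0]]))  # [(0.0, 0.5), (1.0, 1.5)]
```

After this transformation a value of 1 means "equal to the typical value for paths of this length", so curves for lengths 3 to 9 can be overlaid on a common axis.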

    Example illustrating the construction of the topical vector for the Isaac Newton article.

    <p>For the Isaac Newton page one first considers the list of parent categories (panel A). For each category, one identifies the most representative topics (panel B), selecting those for which the depth of the category in the category tree is minimal. For each page, we consider the whole list of most representative topics and the corresponding depths (panel C). For instance, the category <i>copernican_revolution</i> has its smallest depth (equal to 3) in the tree of the topic <i>SCIENCE</i>. The vector representation over the main topics is then obtained by weighting each topic with the inverse of the minimal depth computed above (panel D). For instance, the topic <i>SCIENCE</i> appears in the topical vector with weight 1/2.</p>
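The construction in panels A-D can be written as a short function. This sketch assumes the category-to-topic depths are already available as a nested dictionary, that depths are at least 1, and that a topic reached through several categories keeps its minimal depth. The category `hypothetical_category` and all depth values are invented for illustration and do not reproduce the real Isaac Newton page.

```python
def topical_vector(parent_categories, depth_in_topic):
    """Topical vector of a page as {topic: 1 / minimal depth}.

    `depth_in_topic[c][t]` is the depth (assumed >= 1) of category c in the
    category tree of topic t. For each parent category only the topics reached
    at its minimal depth are kept (its most representative topics); a topic
    reached through several categories keeps the smallest depth.
    """
    best_depth = {}
    for category in parent_categories:
        depths = depth_in_topic.get(category)
        if not depths:
            continue  # category not connected to any main topic
        d_min = min(depths.values())
        for topic, d in depths.items():
            if d == d_min:
                best_depth[topic] = min(best_depth.get(topic, d), d)
    return {topic: 1.0 / d for topic, d in best_depth.items()}


# Hypothetical depths; they do not reproduce the real Isaac Newton page.
depths = {
    "copernican_revolution": {"SCIENCE": 3, "HISTORY": 4},
    "hypothetical_category": {"SCIENCE": 2, "PEOPLE": 2},
}
print(topical_vector(["copernican_revolution", "hypothetical_category"], depths))
```

With these toy depths the page gets weight 1/2 on both SCIENCE and PEOPLE, while HISTORY is dropped because it is not a minimal-depth topic for its category. Taking the minimum over categories is one way a topic could end up at weight 1/2 even though one of its categories sits at depth 3.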

    Similarity scores between sources.

    <p>For the two observables norm (left panel) and entropy (right panel), we report the matrix of similarity scores between all the sources and Wikispeedia. The score is defined by <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0170746#pone.0170746.e033" target="_blank">Eq (6)</a>. For each pair of sources, the unrescaled average values of the observable are considered (as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0170746#pone.0170746.g004" target="_blank">Fig 4</a>). Then, for each path length between 4 and 9, the Spearman correlation coefficient is computed between the averaged values of the observable. The final score is then obtained by averaging over all lengths.</p>
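The verbal description of the score can be sketched as below: a Spearman coefficient per path length, computed as the Pearson correlation of average ranks so that ties are handled, followed by a plain mean over lengths 4 to 9. This follows the caption rather than Eq (6), which is not reproduced here, so the exact definition may differ; the dictionary layout `{length: per-step averages}` and the function names are assumptions.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # ZeroDivisionError if a profile is constant


def similarity_score(profile_a, profile_b, lengths=range(4, 10)):
    """Mean over path lengths of the Spearman coefficient between two sources.

    profile_*: {path length: per-step averages of one observable}.
    """
    rhos = [spearman(profile_a[l], profile_b[l]) for l in lengths]
    return sum(rhos) / len(rhos)
```

Because only ranks enter the coefficient, two sources score 1 whenever their averaged profiles rise and fall in the same order, regardless of their absolute values; this is why unrescaled averages can be used directly.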

    Distributions of page norms (left) and entropies (right).

    <p>The distributions are computed over the set of all pages for which a vector representation was derived. For both norm and entropy, the boxes report some example pages to illustrate the meaning of extreme values.</p>
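The caption does not define norm and entropy. A common reading, assumed here purely for illustration, is the Euclidean norm of the topical vector and the Shannon entropy (in bits) of its weights normalized to sum to 1: a page spread evenly across many topics (semantically broad) gets high entropy, while a page concentrated on a single topic gets entropy 0. The function name `norm_and_entropy` is hypothetical.

```python
import math


def norm_and_entropy(vec):
    """Assumed definitions: Euclidean norm of the topical vector and Shannon
    entropy (bits) of its weights normalized to sum to 1."""
    weights = [w for w in vec.values() if w > 0]
    norm = math.sqrt(sum(w * w for w in weights))
    total = sum(weights)
    if total == 0:
        return norm, 0.0  # empty vector: no topics, no uncertainty
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs)
    return norm, entropy


print(norm_and_entropy({"SCIENCE": 0.5, "PEOPLE": 0.5}))
```

Under this reading, four equally weighted topics give 2 bits, two give 1 bit, and one gives 0, so entropy tracks how many topics a page spans while the norm also reflects how shallow (high-weight) those topics are.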